This document uses the pointblank R package to create data validation reports. In each report table, any failing validation step should have a CSV button that downloads a .csv file containing only the rows that fail that step.
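The CSV button offers the failing rows that pointblank extracts during interrogation. The same rows can also be pulled out in code with `get_data_extracts()`. This is a minimal sketch. The toy data frame and the output file name are hypothetical, and this is not necessarily how this report was produced.

```r
library(pointblank)

# Hypothetical toy data standing in for the real plant records
plants <- data.frame(
  plant_id = c("p1", "p2", "p3", "p4"),
  shts     = c(2, 25, 4, 1)
)

agent <- create_agent(tbl = plants, label = "Toy example") |>
  col_vals_between(columns = vars(shts), left = 0, right = 20) |>
  interrogate()  # failing rows are extracted by default (extract_failed = TRUE)

# Rows that failed step 1 (should correspond to what the CSV button offers)
failing <- get_data_extracts(agent, i = 1)
print(failing)

# Write them out, here to a temporary directory (file name is hypothetical)
out_file <- file.path(tempdir(), "step1_failures.csv")
write.csv(failing, out_file, row.names = FALSE)
```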

Check missing values

Action levels

By default, a validation step warns if 1 or more rows fail its condition and errors (STOP) if 2% or more of rows fail. Some checks are run with a stricter action level that errors if any row fails.

al_default <- action_levels(warn_at = 1, stop_at = 0.02) # warn if even one row fails, error if 2% of rows fail
al_strict <- action_levels(stop_at = 1) # error if even one row fails
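The sketch below shows one way to attach these levels, under stated assumptions. The default levels go to the whole agent through `create_agent(actions = ...)`, and a single step is overridden with `actions = al_strict`. The toy data are hypothetical: 1 of 100 heights is out of range, which is 1% of rows.

```r
library(pointblank)

al_default <- action_levels(warn_at = 1, stop_at = 0.02)  # WARN at >= 1 failing row, STOP at >= 2% failing
al_strict  <- action_levels(stop_at = 1)                   # STOP at >= 1 failing row

# Hypothetical toy data: 1 of 100 heights is out of range (1% failing)
toy <- data.frame(ht = c(rep(50, 99), 250))

agent <- create_agent(tbl = toy, actions = al_default) |>
  # Uses the agent-wide default: 1 failure triggers WARN, but 1% < 2% so no STOP
  col_vals_between(columns = vars(ht), left = 0, right = 200) |>
  # Step-level override with the strict levels: any failure triggers STOP
  col_vals_between(columns = vars(ht), left = 0, right = 200, actions = al_strict) |>
  interrogate()

step1 <- get_agent_x_list(agent, i = 1)
step2 <- get_agent_x_list(agent, i = 2)
c(step1_warn = step1$warn, step1_stop = step1$stop, step2_stop = step2$stop)
```

In an agent-based report like this one, reaching a threshold marks the step in the W or S column of the report. As far as pointblank's documented behaviour goes, it does not raise an R error. Errors are raised when validation functions are called directly on a table instead.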

Data Validation

The two datasets submitted with the data paper are HDP_plots.csv and HDP_1997_2009.csv.

Checks for data type, range, and duplicates

Pointblank Validation: Data Validation

Table type: tibble · WARN 1 · STOP 0.02 · NOTIFY (not set)

| Step | Label | Validation | Columns | Values | Units | Pass n (fraction) | Fail n (fraction) |
|---|---|---|---|---|---|---|---|
| 1 | | col_vals_in_set() | subplot | A1 to J10 (letters A to J × numbers 1 to 10; 100 values) | 67K | 67K (1) | 0 (0) |
| 2 | | col_vals_in_set() | plot | CF-1 to CF-6, FF-1 to FF-7 | 67K | 67K (1) | 0 (0) |
| 3 | Height is measured to the nearest cm | col_vals_expr() | | ht %% 1 == 0 | 57K | 57K (1) | 0 (0) |
| 4 | Shoot count is an integer | col_vals_expr() | | shts %% 1 == 0 | 57K | 57K (1) | 0 (0) |
| 5 | Number of inflorescences is an integer | col_vals_expr() | | infl %% 1 == 0 | 2K | 2K (1) | 0 (0) |
| 6 | Shoots between 0 and 20 | col_vals_between() | shts | [0, 20] | 67K | 67K (1) | 8 (0) |
| 7 | Height between 0 and 200 cm | col_vals_between() | ht | [0, 200] | 67K | 67K (1) | 2 (0) |
| 8 | Inflorescences between 0 and 3 | col_vals_between() | infl | [0, 3] | 67K | 67K (1) | 15 (0) |
| 9 | Duplicated rows | rows_distinct() | | | 67K | 67K (1) | 0 (0) |
| 10 | | col_vals_not_null() | plant_id | | 67K | 67K (1) | 0 (0) |
| 11 | Check for duplicate IDs within each year | rows_distinct() | plant_id | | 3K | 3K (1) | 0 (0) |
| 12 | Check for duplicate IDs within each year | rows_distinct() | plant_id | | 4K | 4K (1) | 0 (0) |
| 13 | Check for duplicate IDs within each year | rows_distinct() | plant_id | | 5K | 5K (1) | 0 (0) |
| 14 | Check for duplicate IDs within each year | rows_distinct() | plant_id | | 6K | 6K (1) | 0 (0) |
| 15 | Check for duplicate IDs within each year | rows_distinct() | plant_id | | 6K | 6K (1) | 0 (0) |
| 16 | Check for duplicate IDs within each year | rows_distinct() | plant_id | | 6K | 6K (1) | 0 (0) |
| 17 | Check for duplicate IDs within each year | rows_distinct() | plant_id | | 6K | 6K (1) | 0 (0) |
| 18 | Check for duplicate IDs within each year | rows_distinct() | plant_id | | 6K | 6K (1) | 0 (0) |
| 19 | Check for duplicate IDs within each year | rows_distinct() | plant_id | | 7K | 7K (1) | 0 (0) |
| 20 | Check for duplicate IDs within each year | rows_distinct() | plant_id | | 5K | 5K (1) | 0 (0) |
| 21 | Check for duplicate IDs within each year | rows_distinct() | plant_id | | 6K | 6K (1) | 0 (0) |
| 22 | Check for duplicate IDs within each year | rows_distinct() | plant_id | | 6K | 6K (1) | 0 (0) |

Interrogation started 2022-12-06 17:18:14 UTC and finished 2022-12-06 17:18:18 UTC (4.7 s).
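The steps in the table above could come from an agent along the lines of this sketch. It runs on a hypothetical four-row stand-in for HDP_1997_2009.csv. Several details are assumptions:

- The `year` column name.
- The label text.
- `na_pass = TRUE` on the range checks. Steps 6 to 8 evaluate all 67K rows yet report few failures, while step 5 evaluates only 2K rows, which is consistent with missing values being allowed to pass.

Passing `segments = vars(year)` to `rows_distinct()` is one way to get a separate duplicate-ID step for each year, as in steps 11 to 22. The original code may have done this differently.

```r
library(pointblank)

# Action levels as defined above: WARN at >= 1 failing row, STOP at >= 2% failing
al_default <- action_levels(warn_at = 1, stop_at = 0.02)

# Hypothetical four-row stand-in for HDP_1997_2009.csv ("year" is an assumed column name)
hdp <- data.frame(
  plant_id = c("a", "b", "a", "b"),
  year     = c(1997, 1997, 1998, 1998),
  plot     = c("CF-1", "CF-1", "FF-7", "FF-7"),
  subplot  = c("A1", "J10", "A1", "J10"),
  ht       = c(12, 30, 15, 40),
  shts     = c(1, 3, 2, 4),
  infl     = c(0, 1, 0, 2)
)

# Allowed codes: subplots A1..A10, B1..B10, ..., J1..J10 and plots CF-1..CF-6, FF-1..FF-7
subplots <- paste0(rep(LETTERS[1:10], each = 10), 1:10)
plots    <- c(paste0("CF-", 1:6), paste0("FF-", 1:7))

agent <- create_agent(tbl = hdp, label = "Data Validation", actions = al_default) |>
  col_vals_in_set(columns = vars(subplot), set = subplots) |>
  col_vals_in_set(columns = vars(plot), set = plots) |>
  # Whole-number checks written as expressions
  col_vals_expr(expr = ~ ht %% 1 == 0, label = "Height is measured to the nearest cm") |>
  col_vals_expr(expr = ~ shts %% 1 == 0, label = "Shoot count is an integer") |>
  col_vals_expr(expr = ~ infl %% 1 == 0, label = "Number of inflorescences is an integer") |>
  # Range checks with inclusive bounds; na_pass = TRUE is an assumption
  col_vals_between(columns = vars(shts), left = 0, right = 20, na_pass = TRUE,
                   label = "Shoots between 0 and 20") |>
  col_vals_between(columns = vars(ht), left = 0, right = 200, na_pass = TRUE,
                   label = "Height between 0 and 200 cm") |>
  col_vals_between(columns = vars(infl), left = 0, right = 3, na_pass = TRUE,
                   label = "Inflorescences between 0 and 3") |>
  # Exact duplicate rows across all columns
  rows_distinct(label = "Duplicated rows") |>
  col_vals_not_null(columns = vars(plant_id)) |>
  # One duplicate-ID step per year, produced by segmenting on the (assumed) year column
  rows_distinct(columns = vars(plant_id), segments = vars(year),
                label = "Check for duplicate IDs within each year") |>
  interrogate()

all_passed(agent)
```

With two years in the toy data, the segmented check expands into two steps. The full dataset would expand into one step per survey year.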

Year-to-year change

Checks that year-to-year change in size is reasonable.

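Pointblank Validation: Check growth & regression

Table type: tibble · WARN 1 · STOP 0.02 · NOTIFY (not set)

| Step | Label | Validation | Columns | Values | Units | Pass n (fraction) | Fail n (fraction) |
|---|---|---|---|---|---|---|---|
| 1 | \|% change in height\| < 200% | col_vals_lt() | ht_pc | 2 | 67K | 66K (1) | 422 (0) |
| 2 | \|∆ height\| ≤ 100 cm | col_vals_between() | ht_diff | [−100, 100] | 67K | 67K (1) | 11 (0) |
| 3 | \|∆ shoot number\| ≤ 5 | col_vals_between() | shts_diff | [−5, 5] | 67K | 67K (1) | 201 (0) |

Interrogation started 2022-12-06 17:18:22 UTC and finished 2022-12-06 17:18:22 UTC (< 1 s).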
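This report does not define the derived columns ht_diff, ht_pc and shts_diff. The sketch below assumes ht_diff and shts_diff are each plant's change from its previous record. It also assumes ht_pc is the absolute change divided by the previous height, so the value 2 corresponds to 200%. The column names `plant_id` and `year` and the toy data are assumptions. A plant's first record has no previous value, so `na_pass = TRUE` lets it pass.

```r
library(pointblank)

# Hypothetical toy records: plant "a" in 1997 to 1999, plant "b" in 1997 to 1998
hdp <- data.frame(
  plant_id = c("a", "a", "a", "b", "b"),
  year     = c(1997, 1998, 1999, 1997, 1998),
  ht       = c(10, 35, 40, 50, 45),
  shts     = c(1, 2, 3, 4, 4)
)

# Order each plant's records by year before differencing
hdp <- hdp[order(hdp$plant_id, hdp$year), ]

# Value from the same plant's previous record (NA for its first record).
# This lags by record, so after a skipped year it compares with the last year measured.
prev <- function(x) ave(x, hdp$plant_id, FUN = function(v) c(NA, head(v, -1)))

hdp$ht_diff   <- hdp$ht - prev(hdp$ht)
hdp$ht_pc     <- abs(hdp$ht_diff) / prev(hdp$ht)  # assumed definition: |change| / previous height
hdp$shts_diff <- hdp$shts - prev(hdp$shts)

agent <- create_agent(tbl = hdp, label = "Check growth & regression",
                      actions = action_levels(warn_at = 1, stop_at = 0.02)) |>
  col_vals_lt(columns = vars(ht_pc), value = 2, na_pass = TRUE,
              label = "|% change in height| < 200%") |>
  col_vals_between(columns = vars(ht_diff), left = -100, right = 100, na_pass = TRUE,
                   label = "|change in height| <= 100 cm") |>
  col_vals_between(columns = vars(shts_diff), left = -5, right = 5, na_pass = TRUE,
                   label = "|change in shoot number| <= 5") |>
  interrogate()

# Plant "a" grew from 10 to 35 cm (a 250% change), so step 1 should flag that row
get_data_extracts(agent, i = 1)
```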

Seedlings

Checks that the size of seedlings is reasonable.

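Pointblank Validation: Check seedlings

Table type: tibble · WARN 1 · STOP 0.02 · NOTIFY (not set)

| Step | Label | Validation | Columns | Values | Units | Pass n (fraction) | Fail n (fraction) |
|---|---|---|---|---|---|---|---|
| 1 | Shoots < 3 | col_vals_lt() | shts | 3 | 3K | 3K (1) | 12 (0) |
| 2 | Height < 30 cm | col_vals_lt() | ht | 30 | 3K | 3K (1) | 3 (0) |

Interrogation started 2022-12-06 17:18:24 UTC and finished 2022-12-06 17:18:24 UTC (< 1 s).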
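This report does not show how seedling records are identified. The sketch assumes a hypothetical logical column `seedling` and uses a `preconditions` function so that both checks run only on those rows. The toy data are made up.

```r
library(pointblank)

# Hypothetical toy data; "seedling" is an assumed logical flag, not a documented column
hdp <- data.frame(
  plant_id = c("a", "b", "c", "d"),
  seedling = c(TRUE, TRUE, TRUE, FALSE),
  shts     = c(1, 2, 4, 10),
  ht       = c(5, 12, 35, 80)
)

# Precondition: validate only the seedling rows
only_seedlings <- function(x) x[x$seedling, , drop = FALSE]

agent <- create_agent(tbl = hdp, label = "Check seedlings",
                      actions = action_levels(warn_at = 1, stop_at = 0.02)) |>
  col_vals_lt(columns = vars(shts), value = 3, preconditions = only_seedlings,
              label = "Shoots < 3") |>
  col_vals_lt(columns = vars(ht), value = 30, preconditions = only_seedlings,
              label = "Height < 30 cm") |>
  interrogate()

# Seedling "c" (4 shoots, 35 cm) fails both steps; non-seedling "d" is never tested
get_data_extracts(agent, i = 1)
```

The precondition keeps the full table attached to the agent. As a result, each step's test units count only the seedling rows (3 here), which matches how the report shows a smaller unit count for these steps.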